Explicit plugins #3916

Erotemic · 2025-11-07T23:13:08Z

This partially addresses the issue I had in #3915. The main missing piece is documentation updates that shows the user how they can define a new run spec independent of modifying HELM source.

The main idea is add an argument --plugin which can be a list of module names or paths to python modules. The new behavior is that these specified modules are imported at the start of the run_benchmarking function, so custom run specs will be found.

Importing modules based on their importable python name is fairly straight forward, and sometimes useful, however, I would almost always prefer to specify a path to a plugin file so it is explicit what custom logic I'm using. Unfortunately the stdlib doesn't have a mechanism to import a module from its path, but I've been maintaining one in my ubelt library for years, and I simply ported the logic of that into helm/common/import_utils.py. (The way it works is by temporarily adding the module's directory to the PYTHONPATH and then using normal import mechanisms).

If maintainers like the basic idea then I'll go ahead and update the documentation and tests in this PR as well.

Erotemic · 2026-01-08T21:49:04Z

I used codex to:

refactor this to remove the vendered code and just depend on ubelt (this significantly simplifies the diff)
add tests that the different ways to register plugins are respected
add documentation for how to accomplish this

Codex task: https://chatgpt.com/codex/tasks/task_b_69334998af5c832db233d6de61a5e75d

Erotemic · 2026-01-12T16:24:30Z

@yifanmai I think this PR is done / ready for review. On a high level this PR:

Introduces a standardized entry-point style plugin (similar to what pytest uses)
Introduces an explicit --plugin command line argument for easy experimentation. (Adds ubelt dependency for path support, so very minimal new code)
Adds unit tests for each of these new features
Overhauls docs/importing_custom_modules.md which now contains a step-by-step example.

I also fixed a typo in docs/developer_adding_new_models.md.

yifanmai

Thanks for the fixes; this is really great! I think the new approach is much more user-friendly, and the tutorial is very helpful for explaining how to set things up. I have some high level feedback that I'd like to discuss first:

Generally, the new documentation tends to conflate making things importable (e.g. scenarios, metrics) vs importing things (e.g. run spec functions). The documentation seems to assume that the user already knows how importability works when discussing how to import run spec functions. You could consider moving the PYTHONPATH section earlier in the documentation, and give a more complete explanation of how importability works in Python (using PYTHONPATH or installed package, which are the two methods we use here). You could also reuse some text from the old doc.
- The tutorial puts the scenario and the run spec function in the same file. I think it would be better to clarify that only the run spec function needs to be imported.
- With method 1, it should be clearer that project.entry-points.helm only needs to include modules with run spec functions.
- With method 2, you should specify how the importable module path is determined when using a file path with the argument.
- Methods 3 requires using the PYTHONPATH method, and method 4 implicitly changes PYTHONPATH; this should be made clearer.
Relatedly, I think there are a few plausible alternatives for naming. I thought that --plugin was vague, and also the documentation never defines what a plugin is. Alternatives: --import-modules is more explicit in explaining what the argument does. --run-spec-module is more explicit about what the user will use the argument for.

What do you think?

yifanmai · 2026-01-13T19:24:19Z

docs/importing_custom_modules.md

+
+### 4) A Python wrapper script (when you don't want to use `helm-run`)
+
+There is no need to use the `helm-run` entry point, you can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`. Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.


Nit: "There is no need to use the helm-run entry point, you" -> "There is no need to use the helm-run entry point. You"

yifanmai · 2026-01-13T20:47:42Z

src/helm/benchmark/run.py

 from typing import List, Optional


+import ubelt as ub


Nit: single newline before and after third-party import:

from typing import List, Optional import ubelt as ub from helm.benchmark import model_metadata_registry

yifanmai · 2026-01-16T05:03:10Z

docs/importing_custom_modules.md


-If you are using a Python wrapper script that calls `helm.benchmark.run.run_benchmark()` instead of using `helm-run`, Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.
+```bash
+helm-run --plugins my_package.helm_plugin /path/to/local_plugin.py ...


I found this example confusing. It misleads the user into thinking that the flag requires a module path and a file path, and then maps the file to the module, which is not what it does. I think you should use separate examples for the two different syntaxes i.e.

helm-run --plugins my_package.helm_plugin_a my_package.helm_plugin_b...

and

helm-run --plugins /path/to/local_plugin_a.py /path/to/local_plugin_b.py ...

Erotemic · 2026-01-16T14:47:11Z

Thanks for the fixes; this is really great! I think the new approach is much more user-friendly, and the tutorial is very helpful for explaining how to set things up. I have some high level feedback that I'd like to discuss first:

Generally, the new documentation tends to conflate making things importable (e.g. scenarios, metrics) vs importing things (e.g. run spec functions).

Yeah, this was a confusion I had when learning how to use HELM. It seems like there are two different plugin systems, and halfway through writing the new docs I realized that it is only the run spec that needs to be pre-imported, and new classes work as long as their qualified names are somehow importable. If I was writing a system I likely would have chosen one or the other: everything is registered via decorator (or metaclass inheritance) or, everything can be specified by fully qualified importable name. I think I'm shoehorning the run-spec into this documentation because I want a single doc to have an answer to the question: "how do I get HELM to see my code?". Maybe there is a more subtle distinction that I'm not appreciating, but it felt like I could lump both of these under the umbrella term "plugin", and give the user a conceptual grounding point before going into the differences between run specs and custom modules.

You could consider moving the PYTHONPATH section earlier in the documentation, ... reuse some text from the old doc.

That makes sense.

The tutorial puts the scenario and the run spec function in the same file. I think it would be better to clarify that only the run spec function needs to be imported.

Makes sense. I think there is some value in pre-importing the custom classes. It gives you an early error if you have a syntax problem. But it still could be clarified.

With method 1, it should be clearer that project.entry-points.helm only needs to include modules with run spec functions.

With method 2, you should specify how the importable module path is determined when using a file path with the argument.

Methods 3 requires using the PYTHONPATH method, and method 4 implicitly changes PYTHONPATH; this should be made clearer.

These all make sense.

Relatedly, I think there are a few plausible alternatives for naming. I thought that --plugin was vague, and also the documentation never defines what a plugin is. Alternatives: --import-modules is more explicit in explaining what the argument does. --run-spec-module is more explicit about what the user will use the argument for.

What do you think?

I want to argue for keeping plugin.

I could define a "plugin" as any user-specified code that is explicitly or implicitly registered with HELM. (implicit meaning that its qualified path resolves to code that HELM can use). I think plugin is the conceptually the better term here, even though you're right it only does matter for run specs. However, I could envision a future where custom modules could be registered explicitly with decorators (if we wanted to move towards a unified way of handling user code) and there is also a world where run specs could be implicitly specified via qualified paths to the function (I use this method in my pipeline tool kwdagger).

I think plugin has a connotation of: this is my code, this is how I point to it. And even though it's not necessary for the classes, it does pre-ensure they are defined and have a fully qualified name. From your suggestions I would pick --import-modules, but the issue I have with that is it doesn't communicate why you want to import these modules. What is the purpose? To me plugin communicates that you will import something as well as the purpose of bringing external code with new features into the application.

If you're not convinced, I'll change the name, but that's my take on it.

EDIT: GPT Suggested the name --plugin-modules if you like that better. It also suggested adding an alias with --import-modules, but I'm not sure about that idea.

Erotemic · 2026-01-16T21:53:02Z

@yifanmai I believe I've addressed the core concerns.

Kept most language from the original docs
Moved the importability and PYTHONPATH section earlier.
Split classes (importable by name) vs run specs (must be imported).
Clarified entry points should list run spec modules.
Split --plugins examples into module-name vs file-path, and gave a note import via file path.

yifanmai · 2026-01-21T01:01:32Z

Looks good, thanks! Could you run the linter (doc)? Install the dev extra and then run ./pre-commit.sh.

yifanmai · 2026-01-21T01:03:38Z

docs/importing_custom_modules.md

+
+### 4) A Python wrapper script (when you don't want to use `helm-run`)
+
+There is no need to use the `helm-run` entry point. You can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`.


Nit: The first sentence is awkward. I would suggest replacing it with something like: "Instead of using the helm-run entry point, you can instead write your own Python wrapper script that calls..."

yifanmai · 2026-01-22T17:57:28Z

Will merge this and fix lint errors later, for expediency.

Erotemic · 2026-01-22T18:12:57Z

@yifanmai I was just about to fix the issues. Are you taking care of that or do you want me to still run it with a new PR?

yifanmai · 2026-01-22T18:23:24Z

@Erotemic no problem, I will handle the issues.

Erotemic added 2 commits November 7, 2025 16:59

Add argument to specify explicit plugins

e4f4126

Add argument to specify explicit plugins

f635cd0

yifanmai mentioned this pull request Nov 22, 2025

Question: Specifying a custom set of metrics to helm-run. #3915

Open

Erotemic added 2 commits December 5, 2025 12:29

simplify utils

51a1588

Improve plugin entry point tests

34f77a1

Erotemic added 6 commits January 9, 2026 12:57

Refine docs

25c3dae

Document plugin types and extend plugin tests

d5c19ed

fix typo

de2c23f

indent the test code

02ad568

Rework importing custom module docs

d6b47d0

doc tweaks

1f4fdf9

yifanmai reviewed Jan 16, 2026

View reviewed changes

Erotemic added 4 commits January 16, 2026 09:49

fix 3rd party newline style, no import alias

2577388

Sentence split

1d54c36

Update docs based on review

daf3841

Update file path wording

9538b93

yifanmai reviewed Jan 21, 2026

View reviewed changes

dmjoy mentioned this pull request Jan 21, 2026

Add entry_point plugin support for user defined HELM run_specs AIQ-Kitware/aiq-magnet#42

Merged

yifanmai approved these changes Jan 22, 2026

View reviewed changes

yifanmai merged commit f40beac into stanford-crfm:main Jan 22, 2026
3 of 6 checks passed


		### 4) A Python wrapper script (when you don't want to use `helm-run`)

		There is no need to use the `helm-run` entry point, you can instead write a Python wrapper script that calls `helm.benchmark.run.run_benchmark()`. Python will automatically add the directory containing that script to the Python module search path. If your custom classes live in a Python module under that directory, they will automatically be importable by Python. See [Python's documentation](https://docs.python.org/3/library/sys_path_init.html) for more details.

Explicit plugins #3916

Explicit plugins #3916

Conversation

Erotemic commented Nov 7, 2025

Uh oh!

Erotemic commented Jan 8, 2026

Uh oh!

Erotemic commented Jan 12, 2026

Uh oh!

yifanmai left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yifanmai Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

yifanmai Jan 13, 2026

Choose a reason for hiding this comment

Uh oh!

yifanmai Jan 16, 2026

Choose a reason for hiding this comment

Uh oh!

Erotemic commented Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Erotemic commented Jan 16, 2026

Uh oh!

yifanmai commented Jan 21, 2026

Uh oh!

yifanmai Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

yifanmai commented Jan 22, 2026

Uh oh!

Uh oh!

Erotemic commented Jan 22, 2026

Uh oh!

yifanmai commented Jan 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

yifanmai left a comment •

edited

Loading

Erotemic commented Jan 16, 2026 •

edited

Loading